Abstract

Cognitive skills are encoded by a set of productions, which are organized according
to a hierarchical goal structure. People solve problems in new domains by applying weak
problem-solving procedures to declarative knowledge they have about this domain. From
these initial problem solutions, production rules are compiled which are specific to that
domain and that use of the knowledge. Numerous experimental results follow from this
conception of skill organization and skill acquisition. These experiments include predictions
about transfer among skills, differential improvement on problem types, effects of working
memory limitations, and applications to instruction. The theory implies that all variety of skill
acquisition, including that typically regarded as inductive, conforms to this characterization.

Research on the acquisition of cognitive skills has received a great deal of recent attention,
both within the psychological and the artificial intelligence literature (Anderson, 1981, 1982,
1983; Brown & VanLehn, 1980; Carbonell, 1983; Chi, Glaser, & Farr, in press; Kieras &
Bovair, 1985; Laird, Rosenbloom, & Newell, 1984; Langley, 1982; Larkin, McDermott, Simon,
& Simon, 1980; Lesgold, 1984; Newell & Rosenbloom, 1981; VanLehn, 1983). There are
a number of factors motivating this surge of attention. First, there is increasing evidence
that the structure of cognition changes from domain to domain and that behavior changes
qualitatively as experience increases in a domain. This shows up both in comparisons
between novices and experts within a domain (Adelson, 1981; Chase & Simon, 1973; Chi et
al., in press; Jeffries, Turner, Polson, & Atwood, 1981; Larkin et al., 1980; Lesgold, 1984)
and in the failure of descriptions of cognition to be maintained across domains (Anderson,
1985, Ch. 9; Cheng, Holyoak, & Nisbett, in press). For instance, expert problem-solving in
physics involves reasoning forward to the goal, whereas problem-solving in programming
involves reasoning backward from the goal. In developing a learning theory one is striving
for the level of generality that will unify these diverse phenomena. The belief is that learning
principles will show how the novice becomes the expert and how the structure of different
problem domains is mapped into different behavior. Basically, the goal is to use a learning
theory to account for differences in behavior by differences in experience.

A complementary observation to the first is that theories of cognition for a particular
domain are not sufficiently constrained (Anderson, 1976, 1983). There are multiple
theoretical frameworks for accounting for a particular performance and the performance itself
does not allow for an adequate basis for choosing among the accounts. A learning theory
places an enormous constraint on these theoretical accounts because the theoretical
structure proposed to encode the domain knowledge underlying the performance must be
capable of being acquired. This is like the long-standing claim that a learning theory would
provide important constraints on a linguistic theory (for recent efforts see MacWhinney, in
press; Pinker, 1984; Wexler and Culicover, 1980).

While there may be serious questions of the uniqueness of the theoretical accounts 
of cognition that have been advanced, there is little question that the accounts have become 
increasingly more sophisticated and cover much more complex phenomena. This success at 
creating accounts of various complex domain behaviors has emboldened the effort to strive 
for a learning account. Now that we have the theoretical machinery to account for complex 
cognition, we have frameworks in which the learning question can be addressed. This 
optimism has also been fueled by work, largely in artificial intelligence (Hayes-Roth &
McDermott, 1976; Lewis, 1978; Michalski, 1983; Michalski, Carbonell, & Mitchell, 1983;
Mitchell, 1982; Vere, 1977), studying mechanisms for knowledge acquisition. The psychological
research on skill acquisition (Klahr, Langley, & Neches, somewhere) often takes the form of
trying to apply these abstract concepts from AI to the theoretical mechanisms that have
evolved to account for the realities of human cognition. 

Another motivation for interest in skill acquisition is that it seems many of the 
fundamental issues in the study of human cognition turn on accounts of how new human
competences are acquired. For instance, the faculty position, which claims that language
and possibly other higher cognitive functions are based on special principles (Chomsky,
1980; Fodor, 1983), seems to rest on claims that different cognitive competencies are
acquired in different manners. For instance, it is claimed that language acquisition depends
on using "universals of language" to restrict the induction problem. As another example,
when we compare "symbolic" (Newell & Rosenbloom, 1981; VanLehn, 1983) versus "neural"
models of cognition (Rumelhart & Zipser, 1985; McClelland, 1985; Ackley, Hinton, &
Sejnowski, 1985), we see that one of the few places where they make fundamentally
different (as opposed to complementary) claims is with respect to how knowledge is
acquired. Many researchers in various theoretical camps are working on learning because
they realize that a prerequisite to advancing their grander claims about the nature of
cognition is working out the details about how knowledge is acquired in their framework and
substantiating these details.

The ACT* theory of Skill Acquisition

The ACT* theory of skill acquisition (Anderson, 1982, 1983) was developed to meet a
set of three constraints. First, the theory had to function within an existing theory of
cognitive performance, the ACT* production system model of human cognition, which had its
own considerable independent motivation. Second, the theory had to meet sufficiency
conditions, which is to say that it had to be capable of acquiring the full range of cognitive
skills that humans acquire under the same circumstances that they acquire them. Third, it had
to meet necessity conditions, which is to say that it had to be consistent with what was 
known about learning from various empirical studies. 

At the time of development of the ACT* theory, the existing data base fell seriously
short of testing the full range of complex predictions that could be derived from the learning
theory. The theory applies to acquisition of elaborate skills over long time courses. For
obvious reasons, there is a dearth of such research. Moreover, much of the existing
research only reports gross descriptive statistics (e.g., whether "discovery" learning or
"guided" learning is better), rather than analyses at the level of the detail addressed by the
ACT* theory. We had a few detailed protocol studies of skill acquisition over the course of
tens of hours, but these studies can be criticized for the small sample size (often 1) and
the subjectiveness of the data interpretation.

In response to this we have set out on a course of research looking at the 
acquisition of complex mathematical and technical skills taught in various sectors of our
society, e.g., geometry, programming, and text-editing. The most ambitious line of research
is the development of computer-based tutors for large-scale and objective data collection
(Anderson, Boyle, Farrell, & Reiser, 1984; Anderson, Boyle, & Reiser, 1985). Most of the
results from this research line have yet to come in. On a less ambitious scale, we have
performed laboratory analysis of these skills and their acquisition. Most of the research to 
be reviewed in this paper comes from this second line of research. 

This paper has three goals. The first is to set forth some general claims about the course
of skill acquisition. The second goal is to discuss a series of relatively counterintuitive
predictions to be derived from the ACT* theory and to review the state of empirical
evidence relevant to these predictions. The third goal is to report some revisions to that
theory in response to both the empirical data and further theoretical considerations.
However, before attempting any of these it is necessary to review the ACT* theory of skill
acquisition and its empirical connection. I will do this with respect to the domain of writing
LISP programs, where the theory has had its most extensive application (Anderson, Farrell,
& Sauers, 1984). The examples are deliberately chosen to use only the most basic
concepts of LISP so that lack of prior knowledge of LISP will not be a barrier to 
understanding the points. 

Acquisition of LISP programming skills 

Cognitive processing in the ACT* theory occurs as the result of the firing of
productions. Productions are condition-action pairs which specify that if a certain state
occurs in working memory, then particular mental (and possibly physical) actions should take
place. Below are two "Englishified" versions of productions that are used in our simulation
of LISP programming:

P1: IF the goal is to write a solution to a problem
       and there is an example of a solution to a similar problem
    THEN set a goal to map that template to the current case.

P2: IF the goal is to get the first element of List1
    THEN write (CAR List1)

The first production, P1, is one of a number of productions for achieving structural
analogy. It looks for a similar problem with a worked-out solution. If such an example
problem exists, this production will fire and propose using this example as an analog for
solving the current problem. This is a domain-general production and executes, for instance,
when we use last year's income tax forms as models for this year's income tax forms. In
the domain of LISP, it helps implement the very common strategy of using one LISP
program as a model for creating another.

The second production rule, P2, above is one that is specific to LISP and recognizes
the applicability of CAR, one of the most basic of LISP functions, which gets the first
element of a list. For instance, (CAR '(a b c)) = a. One important question a learning
theory has to address is how a system, starting out with only domain-general
productions like the first, acquires domain-specific productions like the second.
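
The condition-action form of P1 and P2 can be made concrete with a small sketch. What follows is a minimal Python illustration, not the ACT* simulation itself: the dictionary used for working memory, the goal labels, and every function name are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Working memory as a plain dict is an assumption of this sketch,
# not the representation used in the ACT* simulation.
WorkingMemory = Dict[str, object]


@dataclass
class Production:
    name: str
    condition: Callable[[WorkingMemory], bool]  # the IF part
    action: Callable[[WorkingMemory], None]     # the THEN part


def p1_condition(wm: WorkingMemory) -> bool:
    # Goal is to write a solution and a worked example is available.
    return wm.get("goal") == "write-solution" and "example" in wm


def p1_action(wm: WorkingMemory) -> None:
    # Set a goal to map the example onto the current case.
    wm["goal"] = "map-template"


def p2_condition(wm: WorkingMemory) -> bool:
    # Goal is to get the first element of some named list.
    return wm.get("goal") == "get-first-element" and "list" in wm


def p2_action(wm: WorkingMemory) -> None:
    # Write the LISP code and mark the goal as achieved.
    wm.setdefault("output", []).append(f"(CAR {wm['list']})")
    wm["goal"] = None


PRODUCTIONS: List[Production] = [
    Production("P1", p1_condition, p1_action),  # domain-general
    Production("P2", p2_condition, p2_action),  # LISP-specific
]


def step(wm: WorkingMemory) -> Optional[str]:
    """Fire the first production whose condition matches; return its name."""
    for production in PRODUCTIONS:
        if production.condition(wm):
            production.action(wm)
            return production.name
    return None
```

Calling `step` on a working memory whose goal is `"get-first-element"` with `list` set to `"LIST1"` fires P2 and records the code `(CAR LIST1)`. The sketch resolves conflicts by simply taking the first match, which is a simplification chosen here and not a claim about conflict resolution in ACT*.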

To illustrate both how productions like the above are combined to solve a problem 
and how domain-specific productions are acquired from domain-general productions, it is
useful to examine a subject's protocol when first learning to define new functions in LISP.
Defining functions is the principal means for creating LISP programs. The subject (BR) had
spent approximately five hours before this point practicing using existing functions in LISP 
and becoming familiar with variables and list structures. 



BR had just finished reading the instruction in Winston and Horn (1981) on how to 
define functions. The only things she made reference to in trying to solve the problem
were a template of what a function definition looked like and some examples in the text of
what function definitions looked like. The template and an example of one of the functions
she looked at are given below. The function is called F-TO-C, and it converts a
temperature in Fahrenheit to centigrade:

(DEFUN <function name>
    (<parameter 1> <parameter 2> ... <parameter n>)
    <process description>)

(DEFUN F-TO-C (TEMP)
    (QUOTIENT (DIFFERENCE TEMP 32) 1.8))

The problem she was given was to write a LISP definition that would create a new
function called FIRST which would return the first element of a list. As already noted, there
is a function that does this, called CAR, which comes with LISP. Thus, she is creating a
redundant function, and this problem is really just an exercise in the syntax of function 
definitions. 

Anderson, Farrell, and Sauers (1984) report a simulation of BR's protocol that
perfectly reproduces her major steps. The production set behind this simulation produces
the goal structure in Figure 1, which is useful for explaining BR's behavior. She starts out
selecting the template as an analog for building a production. A set of domain-general
productions for doing analogy then try to use this template. Subgoals are set to map each
of the major components of the template. Knowing "DEFUN" is a special LISP function, she
writes this first and then she writes "FIRST", which is the name of the function.

Insert Figure 1 about here 



She has trouble in deciding how to map the structure "(<parameter 1> <parameter
2> ... <parameter n>)" because she has no idea what parameters are. However, she looks
at the concrete example of F-TO-C and sees that there is just the argument to the function
and correctly infers she should put a variable name that will be the argument to FIRST,
which she chooses to call LIST1.

Then she turns to trying to map <process description>, which she again cannot
understand but sees in an example that this is just the LISP code that calculates what the
function is supposed to do, and so she writes CAR, which gets the first element of a list.
We have seen above a LISP production that generates CAR. This is the only place in the
original coding of the function where a LISP-specific production fires. We assume it was
acquired from her earlier experience with LISP. To review, her code at this time is:

(DEFUN FIRST (LIST1) (CAR

The major hurdle still remains in this problem, to write the argument to the function.
She again looks to the example for guidance about how to code the argument to a function
within a function definition. She notes that the first argument in an example like F-TO-C is
contained in parentheses: (DIFFERENCE TEMP 32). This is because the argument is itself
a function call and function calls must be placed in parentheses. She does not need
parentheses in defining FIRST, where she simply wants to provide the variable LIST1 as the
argument. However, she does not recognize the distinction between her situation and the
example. She places her argument in parentheses, producing a complete definition:

(DEFUN FIRST (LIST1) (CAR (LIST1)))

When the argument to a function is a list, LISP attempts to treat the first thing
inside the list as a function name. Therefore when she tests FIRST she gets the
error message "Undefined function object: LIST1". In the past she had corrected this error
by quoting the offending list. So she produces the patch:

(DEFUN FIRST (LIST1) (CAR '(LIST1)))

When she tests this function out on a list like (A B C) she gets the answer LIST1
rather than A because LISP now returns the first item of the literal list (LIST1). Eventually,
after some work, she finally produces the correct code, which is:

(DEFUN FIRST (LIST1) (CAR LIST1))
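
BR's three attempts differ only in how the argument to CAR is evaluated, and a toy evaluator makes the difference visible. The Python sketch below is a deliberately tiny illustration and an assumption of this edit, not a real LISP interpreter: it knows only CAR, QUOTE, and variable lookup, represents s-expressions as nested Python lists, and the names `evaluate`, `call_first`, and `UndefinedFunction` are hypothetical.

```python
class UndefinedFunction(Exception):
    """Raised when the head of a list form is not a known function."""


def evaluate(expr, env):
    """Evaluate a tiny LISP subset: symbols, (QUOTE x), and (CAR x)."""
    if isinstance(expr, str):
        # A bare symbol is a variable reference.
        return env[expr]
    head, *args = expr
    if head == "QUOTE":
        # Quoting returns the form itself, unevaluated.
        return args[0]
    if head == "CAR":
        # Evaluate the argument, then take its first element.
        return evaluate(args[0], env)[0]
    # Any other list form is treated as a call to an unknown function.
    raise UndefinedFunction(head)


def call_first(body, argument):
    """Run a candidate body of FIRST with LIST1 bound to `argument`."""
    return evaluate(body, {"LIST1": argument})


first_attempt = ["CAR", ["LIST1"]]             # (CAR (LIST1))
patch = ["CAR", ["QUOTE", ["LIST1"]]]          # (CAR '(LIST1))
correct = ["CAR", "LIST1"]                     # (CAR LIST1)
```

Under these assumptions, `call_first(first_attempt, ["A", "B", "C"])` raises `UndefinedFunction` because `(LIST1)` is read as a call to a function named LIST1, `call_first(patch, ["A", "B", "C"])` returns the literal symbol `LIST1`, and `call_first(correct, ["A", "B", "C"])` returns `A`.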

We have observed 38 subjects solve this problem in the LISP tutor (Anderson &
Reiser, 1985) and a number of further subjects in informal experiments. BR's solution is
typical of novice problem-solving in many ways. The two places where she has difficulty,
specifying the function parameters and specifying the argument to CAR, are the two major
points of difficulty for our LISP tutor subjects. Her error made in specifying the argument
to CAR is made by over half of the tutor subjects. An interesting informal observation that
we have made is that people with no background at all in LISP (i.e., not working with the
tutor), given information about what CAR does, given the function definition template, and
given the F-TO-C example, tend also to solve the problem in the same way as BR and tend
to make the same first error. Thus, it seems that much of the problem-solving is controlled
by analogy and not by detailed understanding of LISP. The success of our simulation also
suggests that this problem-solving by analogy can be well modelled by a production system
with hierarchical goal structure.

Knowledge Compilation 

Our simulation of BR, after producing the solution to this problem, created two new
productions which summarized much of the solution process. It does this by a knowledge
compilation process (described in Anderson, 1982, 1983) that collapses sequences of
productions into single productions that have the same effect as the sequence. Typically,
as in this case, it converts problem-solving by domain-general productions into
domain-specific productions. The two productions acquired by ACT* are given below:

C1: IF the goal is to write a function of one variable
    THEN write (DEFUN function (variable)
         and set as a subgoal to code the relation calculated
             by this function
         and then write ).

C2: IF the goal is to code an argument
       and that argument corresponds to a variable of the function
    THEN write the variable name.



Figure 1 indicates the set of productions that were collapsed to produce each of the
productions above. The first production summarizes the analogy process that went into
figuring out the syntax of a function definition, and we now have a production which directly
produces that syntax without reference to the analog. The second production summarizes
the search that went into finding the correct argument to the function and directly produces
that argument.

We gave our simulation another LISP problem to solve, armed with these two 
additional productions. This was to write a function called SECOND which retrieved the 
second element of a list. Although SECOND is a more complex function definition, both the 
simulation and our subject produced very much more rapid and successful solutions to this 
problem. The speed up in the simulation was due to the fact that fewer productions were 
involved thanks to the compiled productions. 
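
The claim that compiled productions do the same work in fewer firings can be sketched in a few lines. The Python below is only an illustration under strong simplifying assumptions: it models a production firing as a function that appends a token to the code written so far, and it covers only the collapsing of a sequence into one step. It does not attempt the generalization over names that would turn FIRST and LIST1 into variables, as C1 does. All names here (`write`, `compose`, `run`, `analogy_trace`) are hypothetical.

```python
from functools import reduce
from typing import Callable, List, Tuple

State = Tuple[str, ...]           # code tokens written so far
Step = Callable[[State], State]   # the effect of one production firing


def write(token: str) -> Step:
    """Return a step that appends one token to the code written so far."""
    return lambda state: state + (token,)


def compose(steps: List[Step]) -> Step:
    """Collapse a sequence of steps into one step with the same net effect."""
    return lambda state: reduce(lambda acc, step: step(acc), steps, state)


def run(steps: List[Step]) -> Tuple[State, int]:
    """Apply the steps in order; return the final state and the firing count."""
    state: State = ()
    for step in steps:
        state = step(state)
    return state, len(steps)


# Hypothetical trace: three separate firings while mapping the template.
analogy_trace = [write("(DEFUN"), write("FIRST"), write("(LIST1)")]

# One compiled step that has the same effect as the whole trace.
compiled_step = compose(analogy_trace)
```

Running `run(analogy_trace)` and `run([compiled_step])` yields the same written tokens, but the first takes three firings and the second takes one, which is the sense in which the compiled solution is faster in this toy model.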

One feature of this knowledge compilation process is that it predicts a marked 
improvement as we go from a first to a second problem of the same kind. Elsewhere 
(Anderson, 1982), we have commented upon this marked speed-up in the domain of
geometry proof generation. We have found the same thing in studies with the LISP tutor
where students first code FIRST and then SECOND. We find that errors in the function
definition syntax (compiled into C1 above) drop from a median of 2 in FIRST to a median
of 0 errors in SECOND, and the time to type in the function template drops from a
median of 237 seconds to a median of 96 seconds. Success at specifying the variable
argument (compiled into C2 above) changes from a median of 3 errors to a median of 0
errors; time to enter this argument successfully drops from a median of 96 seconds to a
median of 26 seconds. By any measure these are extremely impressive one-trial learning
statistics.
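
To put the reported medians on a common scale, the proportional reductions in time can be computed directly. The short Python sketch below uses only the four median times quoted above; the helper name `percent_reduction` is introduced here for illustration.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# Median seconds from FIRST to SECOND, as reported in the text.
template_time = percent_reduction(237, 96)   # typing the function template
argument_time = percent_reduction(96, 26)    # entering the variable argument
```

On these figures the template time falls by roughly 59 percent and the argument time by roughly 73 percent between the first and second problem.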

Important Features of the ACT* theory

This simulation of BR illustrates a number of important features of the ACT* theory
that will serve as the basis for the predictions and empirical tests to be reported in the next
section of the paper and the theoretical analyses offered in the last section of the paper.
These features are:

1. Productions as the Units of Procedural Knowledge. The major presupposition of
the entire ACT* theory and certainly key to the theory of skill acquisition is the idea that
productions form the units of knowledge. Productions define the steps in which a problem 
is solved and are the units in which knowledge is acquired. 

2. Hierarchical Goal Structure. The ACT* production system specifies a hierarchical
goal structure that organizes the problem solving. Such a hierarchical goal structure is
illustrated in Figure 1. This goal structure is an additional control construct that was not
found in many of the original production system models (Anderson, 1976; Newell, 1973) but
now is becoming quite popular (e.g., Brown & VanLehn, 1980; Laird et al., 1984). It has
proven impossible to develop satisfactory cognitive models that do not have some overall
sense of direction in their behavior. As can be seen with respect to this example, the
hierarchical goal structure closely reflects the hierarchical structure of the problem. More
important than their role in controlling behavior, goals are important to structuring the
learning by knowledge compilation. They serve to indicate which parts of the problem 
solution belong together and can be compiled into a new production. 

3. Initial Use of Weak Methods. This simulation nicely illustrates the critical role that
analogy to examples plays in getting initial performance off the ground. There is a serious
question about how, before a student has any productions specific to performing a task, the
student can do the task. We see in this example that the student can in fact mimic the
structure of a previous solution. However, this simulation also shows that, in contradiction to
frequent characterizations of imitation as mindless (e.g., Fodor, Bever, & Garrett, 1974), the
analogy mechanisms that implement this process of imitation can be quite sophisticated.
For other discussions of the use of analogy in problem-solving see Kling (1971), Carbonell
(1983), and Winston (1979).

While analogy is a very frequent method for getting started in problem-solving, we do 
not think that it is the only way. It is an instance of what are referred to as weak 
problem-solving methods (Newell, 1969), which have as their characteristic that they are
general and can apply in any domain. They are called weak because they do not take
advantage of domain characteristics. Other examples of weak problem-solving methods 
include means-ends analysis and pure forward search. 

4. Knowledge Compilation. All knowledge in the ACT* theory starts out in declarative
form and must be converted to procedural (production) form. This declarative knowledge
can be encodings of examples, of instructions, of general properties of objects, etc. The
weak problem-solving methods are the ones that can apply to the knowledge while it is in
declarative form and interpret its implications for performance. The actual form of the
declarative knowledge determines the weak method adopted. For instance, in the simulation
of BR analogy was used because the declarative knowledge was almost exclusively
knowledge about the template, example programs, and knowledge about the symbols used
therein. In a geometry simulation discussed in Anderson (1982) the declarative knowledge
largely came in the form of rules (e.g., "A reason for a statement can either be that it is
given, or that it can be derived by means of a definition, postulate, or theorem"). This
tended to evoke a working backwards problem-solving method.

When ACT* solves a problem it produces a hierarchical problem-solution generated by
productions. Knowledge compilation is the process that creates efficient domain-specific
productions from this trace of the problem-solving episode. The goal structure is critical to
the knowledge compilation process in that it indicates which steps of the original solution
belong together. The Appendix to this paper contains a more technical specification of
knowledge compilation and the role of goals in this process.

Outlined above is a complete theory of how new skills are acquired: Knowledge
comes in declarative form, is used by weak methods to generate problem-solutions, and the
knowledge compilation process forms new productions. The key step is the knowledge 
compilation process, which produces the domain-specific skill. 

While the four points above are the key features to understand the origin of new 
skills, they do not address the question of how well the knowledge underlying the skill will
be translated into performance. There are two factors determining the success of execution
of productions. These might be viewed as performance factors that modulate the
manifestation of learning except, as we will see, these factors can also be improved with
learning. 

5. Strength. The strength of a production determines how rapidly it applies, and
production rules accumulate strength as they successfully apply. While accumulation of
strength is a very simple learning mechanism, there is good reason to believe that
strengthening is often what determines the rate of skill development in the limit. The ACT*
strengthening mechanism predicts the typical power-function shape of speed-up in skill
performance (Anderson, 1982).
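
A power-function speed-up has a simple signature that a few lines of code can show. The text states only that the speed-up has a power-function shape; the specific form T(N) = a * N^(-b) used below, the function name `practice_time`, and the parameter values a = 10.0 and b = 0.5 are assumptions chosen for illustration and are not fitted to any data in this paper.

```python
def practice_time(trial: int, a: float = 10.0, b: float = 0.5) -> float:
    """Time on a given practice trial under T(N) = a * N**(-b).

    `a` is the time on the first trial and `b` the learning rate;
    both defaults are illustrative values, not estimates from data.
    """
    return a * trial ** (-b)


# Doubling the amount of practice always scales time by the same factor,
# 2**(-b), which is why a power function is a straight line on log-log axes.
ratio_early = practice_time(2) / practice_time(1)
ratio_late = practice_time(8) / practice_time(4)
```

In this sketch the gain from each doubling of practice is constant in proportional terms, so absolute gains are large early in practice and shrink steadily later on.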

6. Working Memory Limitations. In the ACT* theory there are two reasons why errors
are made. The productions are wrong, or the information in working memory on which they
operate is wrong. However, the important observation in terms of performance limitations is
that perfect production sets can display errors due to working memory failures. In fact, the
only way "slips" can occur in the ACT* theory is because critical information is lost from
working memory and consequently the wrong production fires or the right production fires
but produces the wrong result. Just as learning impacts on production strength, it seems
learning impacts on working memory errors. Working memory capacity for a domain can
increase (Chase & Ericsson, 1982), reducing the number of such errors with expertise.

The six features reviewed above lead to a number of interesting consequences when
we begin to analyze how they interact within the framework of the ACT* theory. The
remainder of this paper is devoted to exploring these consequences. 

Transfer 

The commitment to productions as the units of procedural knowledge has some
interesting empirical consequences. In particular it leads to some strong predictions about
the nature of transfer between skills. Specifically, the prediction is that there will be positive
transfer between skills to the extent that the two skills involve the same productions. This
is a variation of Thorndike's (1903) identical elements theory of transfer. Thorndike argued
that there would be transfer between two skills to the extent that they involved the same
content. Thorndike was a little vague on exactly what was meant by content, but he has
been interpreted to mean something like stimulus-response pairs (Orata, 1928). The ACT*
theory offers a more abstract concept, the production, to replace the more concrete
concept, the stimulus-response pair. In ACT* the abstraction comes in two ways. First, the
productions can be variablized and hence not refer to specific elements. Second, there is a
hierarchical goal structure controlling behavior, and many of the productions are concerned
with generating this goal structure rather than specific behavior.

Unfortunately, there is a serious problem in putting the prediction about
production-based transfer to test because it depends on agreement as to what the productions
are that underlie two tasks. There is always the danger of fashioning the production system
models to fit the observed degree of transfer. Therefore, it is important to select tasks where
there is already independent evidence as to the appropriate production system analysis. Singley
and Anderson (in press) chose text editing because there already existed production-system
models or production-system-like models of this task (Card, Moran, & Newell, 1983; Kieras
& Polson, in press). While these are not ACT* production systems, they serve to nearly
completely constrain how one would produce ACT* production system models for these tasks.

Using the Card, Moran, and Newell model as a guideline, the first production in
performing a text edit is:

E0: IF the goal is to edit a manuscript
    THEN set as subgoals
         1. To characterize the next edit to perform
         2. To perform the edit

The first subgoal sets up the process of scanning for the next thing to be changed and 
encoding that change. The assumption is that this process should be common to all 
editors. The actual performance of the edit can vary from editor to editor. For instance, in
the line editor ED, the following productions would fire among others to replace one word
by another: 

E1: IF the goal is to perform an edit
    THEN set as subgoals
         1. To locate the line
         2. To type the edit

E2: IF the goal is to locate the target line
       and the current line is not the target line
    THEN increment the line

E3: IF the goal is to locate the target line
       and the current line is the target line
    THEN POP success

E4: IF the goal is to type the edit
    THEN set as subgoals
         1. To choose the arguments
         2. To type the command

E5: IF the goal is to type the command
    THEN set as subgoals
         1. To type the command name
         2. To type the arguments

E6: IF the goal is to type the command name for substitution
    THEN type S



Figure 2 illustrates the goal structure of these productions. The triangles in Figure 2 
indicate goals that would be achieved by productions other than those specified above.



We studied transfer to two other editors, EDT and EMACS. EDT has different
command names than ED and, in the subset of EDT that we used, subjects had to achieve
the goal of locating a line by specifying a search string, rather than incrementing the line
number. Otherwise, the goal structure is identical for ED and EDT. EMACS, on the other
hand, is identical only in the top-level production and the process of characterizing what
needs to be done. It is a screen editor rather than a line editor and so has very different
means of making edits. Figure 2 illustrates where the two editors are similar and where
they are different. From this analysis we made the following two predictions about transfer
among editors based on the production overlap:

1. There should be large positive transfer between ED and EDT. The failure of
transfer should be localized mainly to the components associated with locating lines, where
the two differ substantially in terms of productions. This prediction of high positive transfer
holds despite the fact that the actual stream of characters typed is quite different in the two
editors. What is important is that the two editors have nearly identical goal structures. So
it violates Thorndike's surface interpretation of the identical elements principle.

Insert Figure 2 about here

2. There should be little positive transfer from either ED or EDT to EMACS despite
the fact that they are both text editors. Moreover, what positive transfer there is should be
largely confined to the process of characterizing the edit rather than executing it.
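
The logic behind both predictions is a comparison of production sets, which can be sketched as set overlap. The Python below is a hedged illustration only: the production labels are hypothetical stand-ins invented for this sketch, not the productions of the actual models, and the overlap fraction is a simple measure assumed here, not the metric used in the experiment.

```python
from typing import Set

# Hypothetical production labels. They encode only what the text states:
# ED and EDT share their goal structure except for line location and
# command names; EMACS shares only the top level and edit characterization.
ED: Set[str] = {
    "edit-manuscript", "characterize-edit", "perform-edit",
    "type-edit", "type-command",
    "locate-line-by-number", "command-name-ed",
}
EDT: Set[str] = {
    "edit-manuscript", "characterize-edit", "perform-edit",
    "type-edit", "type-command",
    "locate-line-by-search", "command-name-edt",
}
EMACS: Set[str] = {
    "edit-manuscript", "characterize-edit",
    "move-cursor", "screen-insert", "screen-delete",
}


def overlap(source: Set[str], target: Set[str]) -> float:
    """Fraction of the target editor's productions already known from source."""
    return len(source & target) / len(target)


def to_be_learned(source: Set[str], target: Set[str]) -> Set[str]:
    """Productions of the target editor that the source editor does not supply."""
    return target - source
```

With these illustrative sets, `overlap(ED, EDT)` is well above `overlap(ED, EMACS)`, and `to_be_learned(ED, EDT)` contains only the line-location and command-name productions, mirroring where the failure of transfer is predicted to be localized.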

The experiment had five experimental groups. One practiced EMACS for six days.
This will be referred to as the one-editor group. A second group practiced ED for four
days and then learned and practiced EMACS for the last two. A third group practiced EDT
for the first four days and then transferred to EMACS for the last two days. Groups 2 and
3 above will be referred to as the two-editor groups. A fourth group practiced ED for two
days, EDT for two more, and EMACS for the last two. A fifth group practiced EDT for two
days, ED for two more, and EMACS for the last two. These groups will be called the
three-editor groups. Note that all groups worked in EMACS for the last two days of the
experiment.

The results are presented in Figure 3 in terms of time to perform an edit. The first, 
uninteresting, thing to note is that EMACS is a good deal more efficient than the line 
editors; that is, the one-editor group is much faster at the beginning when they are using 
EMACS than the other groups who are using a line editor. The second, significant, 
observation is that there is virtually no difference between the two- and three-editor groups. 
In particular, on Day 3 when the three-editor subjects are transferring to a new editor they 
are almost as fast as the two-editor subjects who are using that editor for the third day. 
Thus, we have almost total positive transfer as predicted. Moreover, when we examined the 
microstructure of the times, we found that the only place the three-editor subjects were at a 
significant disadvantage to the two-editor subjects was in line location, as predicted. 

Insert Figure 3 about here 



In contrast, when any of the line-editor groups transfers to EMACS on Day 5, they 
are at a considerable disadvantage to subjects who have been practicing EMACS all along. 
They are, however, faster than the EMACS-only subjects were on Day 1. We would attribute this 
advantage to subjects' ability to characterize the edits they have to perform. Consistent with 
this theory, we found nearly as much transfer to EMACS in a control group who spent four 
days simply retyping an edited text. Presumably, this group was also learning how to 
interpret edits marked on the page. Thus, we see that if we take a production system 
analysis quite seriously, we are able to apply an identical elements metric to predict transfer. 
Singley and Anderson (in preparation) should be consulted for a very fine-grained analysis of 
the data from this experiment in terms of a production system analysis. There is reason to 
believe that production system analyses may prove to be generally useful in predicting 
transfer. Quite independently, Kieras & Bovair (1985) and Polson & Kieras (1985) have 
found similar success using a production system analysis to predict transfer among a 
different set of skills. 

Negative transfer among cognitive skills? 

There is a peculiar consequence of the identical elements model, which is that it 
does not directly predict any negative transfer among skills in the sense of one procedure 
running less effectively because another has been learned. The worst possible case is that 
there is zero overlap among the productions that underlie a skill, in which case there will be 
no transfer, positive or negative. It is possible to get negative transfer defined in terms of 
an overall measure like total time or success if a procedure optimal in one situation is 
transferred to another domain where it is not optimal, as in the Einstellung phenomenon 
(Luchins & Luchins, 1959). However, in our analyses of the Einstellung effect we would 
assume perfect transfer of the production set. It is a case of perfect transfer of 
productions leading to very non-optimal performance, not a case of the productions firing 
more slowly or in a more errorful way. 

Interference or negative transfer is quite common with declarative knowledge, but it 
can be quite a bit more difficult to get in the domain of skills (Anderson, 1985, Ch. 9), 
suggesting a basic difference between procedural and declarative knowledge. Singley and I 
decided to create in the text editor domain what would be a classic interference design in 
the verbal learning domain. We created an editor called Perverse EMACS, which was just 
like EMACS except that the assignment of keys to function was essentially permuted. So 
for instance, in EMACS control-D erases a letter, escape-D a word, and control-N goes 
down a line, while in Perverse EMACS control-D went down a line and escape-R erased a 
letter. If we assume that the functionalities are the stimuli and the keys the responses, we 
have an A-B, A-Br interference paradigm in paired-associate terms (Postman, 1971). 
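The A-B, A-Br structure can be made concrete with a small sketch that treats each editor as a mapping from function (stimulus) to key (response). The bindings marked "assumed" are hypothetical fillers added only so that the example forms a complete permutation; the remaining bindings are the ones mentioned above.

```python
# Function-to-key assignments. Entries marked "assumed" are hypothetical
# fillers, not bindings reported for either editor.
EMACS = {
    "erase letter": "control-D",
    "erase word": "escape-D",
    "down a line": "control-N",
    "other command (assumed)": "escape-R",
}
PERVERSE_EMACS = {
    "down a line": "control-D",             # from the text
    "erase letter": "escape-R",             # from the text
    "erase word": "control-N",              # assumed
    "other command (assumed)": "escape-D",  # assumed
}


def is_a_b_a_br(first_list, second_list):
    """True when the second list re-pairs the same stimuli with the same responses."""
    same_stimuli = first_list.keys() == second_list.keys()
    same_responses = sorted(first_list.values()) == sorted(second_list.values())
    all_re_paired = all(first_list[s] != second_list.get(s) for s in first_list)
    return same_stimuli and same_responses and all_re_paired
```

Under these assumptions `is_a_b_a_br(EMACS, PERVERSE_EMACS)` holds, while comparing EMACS with itself does not, since no pairing has changed.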

We compared two groups of subjects. One spent six days with EMACS while the 
other spent their first two days with EMACS, then the next two days with Perverse EMACS, 
and the final two days with EMACS again. Figure 4 shows the results of this experiment. 
Throughout the experiment, including on the first two days where both groups of subjects 
are learning EMACS, the Perverse EMACS transfer subjects are slightly worse. However, the 
only day they are significantly worse is on the third day when they transfer to Perverse 
EMACS. That difference largely disappears by the fourth day when they are still working 
with Perverse EMACS. Compared to Day 1 on EMACS, there is large positive transfer on 
Day 3 to Perverse EMACS, reflecting the production overlap. The difference between the 
two groups on Day 3 reflects the cost of learning the specific rules of Perverse EMACS. 
When the transfer subjects went back to EMACS on Day 5 they picked up on the same 
point on the learning curve as had the subjects who had stayed with EMACS all through. 
(While they are slower than the pure EMACS subjects in Days 5 and 6, they are no more 
slow than they were on Day 2 when they last used EMACS.) This is because they had 
been practicing in Perverse EMACS largely the same productions as they would be using in 
regular EMACS. 



Insert Figure 4 about here 



In conclusion, this research on text editors supports the claim that production systems 
should be viewed seriously as analyses of procedural knowledge. We see that transfer can 
be predicted by measuring production overlap (for actual detailed measures see Singley & 
Anderson, in preparation). We also see that procedural knowledge behaves differently than 
declarative knowledge in that it does not seem to be subject to principles of interference. 

Effects of Knowledge Compilation 

There are a number of predictions that follow from the knowledge compilation 
process, which is the basic production learning mechanism in the ACT* theory. This 
process collapses sequences of productions into single operators. If one repeatedly presents 
problems that require the same combination of productions, the subject should compose 
productions to reflect that combination and strengthen that production. This implies that 
frequently co-occurring combinations of operations should gain an advantage over less 
frequently co-occurring combinations of the same operations. 

Table 1 illustrates part of an experiment by McKendree and Anderson (in press) that 
was designed to test this prediction. We were looking at subjects' ability to evaluate various 
combinations of LISP expressions. For instance, subjects had to evaluate (CAR (CDR ((A 
B)(C D)(E F)))). CDR returns all but the first element of a list; so (CDR ((A B)(C D)(E F))) 
is ((C D)(E F)). The function CAR will return the first element of this, or (C D), which is the 
correct answer to the original problem. We did the experiment to see if subjects would 
learn operators to evaluate combinations of functions such as this CAR-CDR combination. 
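The two evaluation steps in this example can be traced with a minimal sketch. It uses Python lists as a stand-in for LISP lists, which is an assumption of this illustration; the helper names `car` and `cdr` simply mirror the LISP functions described above.

```python
def car(lst):
    """Return the first element of a list, as LISP's CAR does."""
    return lst[0]


def cdr(lst):
    """Return all but the first element of a list, as LISP's CDR does."""
    return lst[1:]


# The argument from the example: ((A B)(C D)(E F))
argument = [["A", "B"], ["C", "D"], ["E", "F"]]

inner = cdr(argument)   # step 1: evaluate the inner CDR
answer = car(inner)     # step 2: apply CAR to that result
```

Evaluating the inner function first and then the outer one is exactly the two-step sequence that a compiled CAR-CDR operator would collapse into a single step.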

Insert Table 1 about here 

We had subjects study various combinations of functions with differential frequency. 
Table 1 gives how frequently subjects would see combinations of each function each day. (It 
is the actual function combination that was seen with the given frequency; we had subjects 
apply that function combination to new arguments each time.) Some function combinations 
were seen three times as frequently as were other function combinations. As indicated in 
Table 1, we counterbalanced over two groups which combinations were the high and which 
were the low frequency combinations. While the frequency of combination varies between 
the two groups, the actual frequency of individual functions does not. Thus, if we see an 
advantage for a combination, we will know that the advantage is in fact for that combination, 
not the individual functions. 

The knowledge compilation process predicts a speed advantage in evaluating 
high-frequency combinations because of two factors. First, the compiled productions for the 
high-frequency combinations are likely to be learned sooner. Second, once learned they will 
acquire strength more rapidly because of their greater frequency. 

The experiment involved presenting students with these function combinations, 
combinations of other pairs of LISP functions that similarly varied in frequency, problems that 
required that subjects just evaluate single LISP functions, and problems that required that 
subjects evaluate triples of functions. Subjects were given four days of practice in which 
the frequencies of the pairs were maintained as in Table 1. We measured subjects' time to 
type solutions into the computer for these problems. Averaged over the four days, subjects 
took 6.55 seconds to evaluate the high frequency combinations and 8.61 seconds for the 
low frequency combinations. This difference is predicted by the assumptions that subjects 
are compiling productions to reflect various function combinations and are strengthening 
these compiled productions according to the frequency of their applicability. 

This also shows that the development of skill is highly specific to its use. That is, 
even though the subject may be good at evaluating the combinations car-cdr and cdr-car, 
the subject may be poor at evaluating the combinations car-car and cdr-car, despite the fact 
that these evaluations involve the same primitive knowledge as the first two. 

Use-Specificity of Knowledge 

Declarative knowledge is flexible and not committed to how it will be used. However, 
often knowledge compilation will derive productions from that declarative knowledge which 
can only be used in certain ways. Often the production sets underlying different uses of 
the same knowledge can be quite different. For instance, productions for language 
comprehension would be quite different from productions for language generation. 

There is no reason to predict transfer between different uses of the same knowledge. 
Neves and Anderson (1981), for instance, compared subjects' ability to generate a proof in a 
logic-like system with their ability to give the reasons that justified the steps of a worked-out 
proof. We were interested in this comparison because even though these two tasks make 
use of the same logical postulates, there is no overlap in the production system that 
implements one skill versus another. Generating a proof involves search for a path between 
givens of the problem and the statement to be proven, while reason-giving involves a search 
through the rule space looking for a rule whose consequence matches the statement to be 
justified. Neves and I found evidence that 10 days of practice at reason giving had little 
positive transfer to proof generation. However, that evidence was based on a very small 
sample of subjects and informal evaluation of their proof generation skills. 

McKendree and Anderson looked at this issue more thoroughly in their study of LISP 
evaluation skills. To review, that experiment was concerned with having subjects evaluate 
the results of LISP functions applied to arguments. A typical problem was 

(LIST (CAR '(A B)) '(B C)) = ? 

where the correct answer is (A (B C)). In addition to massive practice on these evaluation 
problems, on the first and fourth day we gave subjects a few generation problems that were 
isomorphic to these evaluation problems. For the example above, the isomorphic generation 
problem would ask subjects to write a LISP expression that will operate 
on (A B) and (B C) and produce (A (B C)), for which the expression above would be the 
correct answer. 

There is again no overlap between productions that solve the evaluation problems and 
productions that solve the generation problem, although again these productions are based 
on the same abstract declarative knowledge. However, that declarative knowledge has 
different production form when it has been compiled for a particular use. For instance, 
consider the abstract fact that the LISP function LIST inserts its arguments in a list. Used 
in evaluation form this is encoded by the production: 

IF the goal is to evaluate (LIST X Y) 
and A is the value of X 
and B is the value of Y 
THEN (A B) is the value 

Used in generation form this is encoded by the production (actually taken from our LISP 
tutor): 

IF the goal is to code (A B) 
THEN use the function LIST and set as subgoals 

1. To code A 

2. To code B 
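To make the contrast between the two forms concrete, the sketch below writes each production as a separate Python function. This is an illustrative assumption, not the tutor's implementation: the function names, the use of Python lists for LISP lists, and the quoting fallback for anything that is not a two-element list are all hypothetical.

```python
def to_lisp(value):
    """Render a nested Python list as LISP text, e.g. ['B', 'C'] -> '(B C)'."""
    if isinstance(value, list):
        return "(" + " ".join(to_lisp(v) for v in value) + ")"
    return value


def evaluate_list(value_of_x, value_of_y):
    # Evaluation form: IF the goal is to evaluate (LIST X Y), A is the value
    # of X and B is the value of Y, THEN (A B) is the value.
    return [value_of_x, value_of_y]


def code_for(target):
    # Generation form: IF the goal is to code (A B), THEN use the function
    # LIST and set as subgoals to code A and to code B.
    if isinstance(target, list) and len(target) == 2:
        first, second = target
        return "(LIST " + code_for(first) + " " + code_for(second) + ")"
    # Fallback (an assumption of this sketch): quote the target literally.
    return "'" + to_lisp(target)
```

Neither function calls the other or shares any condition with it: one maps argument values to a result, the other maps a desired result to code plus subgoals. The same fact about LIST is therefore stored twice, once per use.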

This is basically the same knowledge in different form, but the ACT claim is that the 
difference in form is significant, and there will not be transfer between the two forms. Most 
people find this prediction quite counterintuitive. Standard educational practice (at least at 
the college level) seems to believe that we should get students to understand general 
principles and how to apply them in one domain. The belief is that if they have this 
knowledge, they will know how to apply the principles in any domain. 

The McKendree and Anderson experiment allowed us to address the issue of whether 
massive practice on evaluation would generalize to generation. Averaging over single, pair, 
and triple function evaluations, subjects' evaluation time dropped from 18.2 seconds on Day 1 to 
10.0 seconds on Day 4. Subjects' accuracy increased from 58% correct to 85% correct 
over the four days that they were practicing evaluation. In contrast, they only changed from 
71% correct in the first generation test at the end of the first day to 74% correct in the 
second generation test at the end of the fourth day. (We don't have reaction time 
measures for the generation task because subjects had to work out the solutions by hand 
rather than typing them into a computer.) Thus, it seems that there is little transfer from 
evaluation to generation, two different skills that use the same knowledge. This reinforces 
the idea that production representation captures significant features of our procedural 
knowledge and that differences between production forms are psychologically real. 

I think, in combination with the earlier mentioned results on transfer, a rather startling 
result is emerging. We can get total transfer across tasks if the same knowledge is used 
in the same way (e.g., transfer between text editors ED and EDT). On the other hand, we 
get no transfer at all if the same knowledge is used in different ways (e.g., transfer between 
evaluation and generation). 

Effects of Proceduralization 

Part of the compilation process is the elimination of the need to hold declarative 
information in working memory for interpretation; rather, that information is built into the 
proceduralized production. There are a number of predictions about qualitative changes in 
skill execution which follow from proceduralization. Many of these are discussed in Anderson 
(1982). These changes include disappearance of verbalization of the declarative knowledge 
and decrease in errors due to working memory forgetting. 

Part of the Singley and Anderson experiment was a very specific test of the 
qualitative changes implied by proceduralization. Before proceduralization there should be 
declarative interference between performing a skill and remembering similar declarative 
knowledge, but this content-based interference should disappear when the knowledge is 
proceduralized. In a dual-task experiment Singley and Anderson required subjects to 
memorize facts being presented to them in the auditory modality and to perform text editing 
operations involving the visual modality. We varied the content of the information they were 
being asked to remember to either be the description of a hypothetical text editor or to be 
unrelated to text editing. Memory for the hypothetical text editing information started out poorly but 
improved with practice. Memory for the unrelated information did not start out so poorly but 
did not improve. This is consistent with the view that subjects were initially suffering 
declarative interference with the text editor information from the execution of the text editor 
skill, which disappeared when the skill became proceduralized with practice. 

Working Memory Limitations 

As we noted earlier, there is reason to believe that in domains such as writing 
computer programs, the major source of errors will be working-memory failures. The student 
gets good feedback on the facts about the programming language, and so we would expect 
misconceptions to be rapidly stamped out. On the other hand, the student is being asked 
to coordinate a great deal of knowledge in writing a program, and one would expect 
working memory capacity to be frequently overwhelmed. 

Anderson & Jeffries (in press) looked at the errors made by over a hundred students 
in introductory LISP classes. We found that over 30% of all answers were errors. The 
results we obtained were consistent with the working memory failure analysis. Increasing the 
complexity of one part of the problem increased errors in another part, suggesting that the 
capacity requirements to represent one part overflowed and impacted on the representation 
of another part. Further, errors were non-systematic; that is, subjects did not repeat errors 
as one would expect if there were some systematic misconception. 



Interestingly, the best predictor of individual subject differences in errors on problems 
that involved one LISP concept was number of errors on other problems that involved 
different concepts. This in turn was correlated with amount of programming experience and 
experience with LISP. It seems that subjects differ in their general proneness to these 
working memory errors in a domain and that experience in the domain increases working 
memory capacity. Jeffries, Turner, Polson, and Atwood (1981) found that memory capacity 
was a major difference separating beginning programmers from experienced programmers. 
Chase and Ericsson (1982) showed that experience in a domain can increase capacity for 
that domain. Their analysis implied that what was happening is that storage of new 
information in long-term memory became so reliable that long-term memory became an effective 
extension of short-term memory. 

Consequences of Working Memory Failures 

Loss of declarative information from working memory can cause good procedures to 
behave badly. This can happen in a number of ways. Most simply, if one loses 
information that is needed in the answer, the answer will be missing this information. 
Interestingly, Anderson and Jeffries found that the most common error in calculating LISP 
answers is to drop parentheses. This error would occur if a parenthesis were lost from the 
representation of the problem they were manipulating in working memory. This was more 
than twice as frequent as adding parentheses. 

A slightly more profound type of working memory error occurs when students lose a goal 
they are trying to achieve. Katz and Anderson (in preparation) found that a major error in 
syntactically correct programs was the omission of parts of code that correspond to goals. 
Often subjects mention these goals in protocols taken while writing the programs, but they 
just never get around to achieving them. 



A third kind of error that can occur because of working memory loss is that subjects 
will lose a feature that discriminates between two productions and so fire the wrong 
production. A common example of this is in geometry, where students will apply a postulate 
involving congruence of segments to information about equality of measure of segments. 
Students when queried can explain why they are wrong, but there is just a momentary 
intrusion of the postulate. An example in the context of LISP is the confusion of 
similar LISP functions in programs. Again subjects can explain their mistake when queried, 
but it seems that the production generating the wrong function intruded. Anderson and 
Jeffries have shown that the frequency of these intrusions can be increased by increasing 
the concurrent memory load on subjects. This is consistent with the view that discriminating 
information is being crowded out of working memory. 

Norman (1981) has provided a classification of what he calls action slips, which 
appear to be identical in reference to our working memory errors. However, he interprets 
these with respect to an activation-based schema system rather than a production system. 
With respect to cognitive-level errors such as occur in LISP programming, there has hardly 
been adequate research to separate the two views. However, evidence that such errors 
increase with working memory load is certainly consistent with the working memory plus 
production system hypothesis. 

Immediate Feedback 

The importance of immediate feedback is an interesting consequence of the 
interaction between compilation and working memory limitations. For knowledge compilation 
to work correctly with delayed feedback, it is necessary to preserve information about the 
decision point until information is obtained about the right decision. Then an operator can 
be compiled, attaching the correct action to the critical information at the decision point. 
Working memory failures will cause information about the decision point to be lost and 
consequently incorrect operators will be formed. Such failures will increase with the delay of 
the feedback. Thus, the commonly recommended practice of having subjects discover their 
errors may have a negative impact on learning rate. 

Lewis & Anderson (1985) performed an experiment to see whether it was better to 
have subjects discover errors or to inform them immediately that an error had been made. 
We had subjects learn to solve dungeon-quest-like games. Figure 5 shows a typical 
situation that a subject might face. A room is described with certain features and the 
subject can try certain operators (like waving the wand, slaying the roomkeeper) which will 
move the subject on to other rooms. A subject might move to a room which leads to a 
dead end. In the immediate feedback condition subjects were immediately told they had 
entered such a room, while in the delay condition they had to discover by further exploration 
that they had reached a dead end. 



Insert Figure 5 about here 

Lewis and Anderson found that subjects performed considerably better, in terms of 
number of correct moves, given immediate feedback. Other researchers (R. C. Anderson, 
Kulhavy, & Andre, 1972) have found that the type of feedback is critical to determining its 
effectiveness. It is important that the feedback not give away the correct answer but only 
signal to the learner that there is an error state. This is because the student must go 
through the process of calculating the correct answer, rather than copying it, if an effective 
operator is going to be compiled. If subjects copy, they will compile a procedure for 
copying which will only make them more efficient at copying answers and not at producing 
them. 



The issue of immediate feedback versus learning by discovery is quite controversial, 
and I do not mean to imply that immediate feedback is the perfect condition for all types of 
learning. For instance, Lewis and Anderson found, reasonably enough, that subjects allowed 
to learn by discovery were better able to recognize error states. Similarly, if one wants to 
learn a skill such as debugging computer programs, one might let students make errors in 
their programs so that there is something to debug. However, the theoretical point is that 
immediate feedback on an operator is important to learning the operator. Both in the error 
detection and the debugging examples, skill at successfully generating solutions is being 
sacrificed so that circumstances can be created to facilitate learning skills of dealing with 
errors. 



Further Implications for Instruction 

This view of skill acquisition has some important implications for instruction. One 
straightforward observation is that in a system that can only learn skills by doing them, the 
importance of formal instruction diminishes and the importance of practice increases. There 
is already reason to doubt the value of elaborate textual instruction for communicating 
declarative knowledge (Reder and Anderson, 1980). Carroll (1985) has shown that standard 
text-editor manuals can be made more effective if they are greatly reduced in length to 
focus on just the information necessary to perform the skill. His analysis of this phenomenon 
is that a shorter text gives the student more opportunity to focus on practising the skill. 
Reder, Charney, and Morgan (in press) have shown that the only elaborations that are 
effective in a manual on personal computers are those that provide examples of how to use 
the personal computer. 

Much of technical instruction is not focused on telling students how to solve problems 
in the technical domain but explaining why solutions work in the domain. This distinction 
might seem subtle, but it is important. A good example of this is recursive functions in the 
language LISP. Most textbooks focus on explaining to students how recursive programs are 
evaluated in LISP to produce their results rather than how one thinks up a recursive function 
to solve a programming problem. The question naturally arises whether being told how 
recursive functions are evaluated or how recursive programs are created is more effective for 
the purpose of writing recursive programs. 

There are analyses in AI (Rich & Shrobe, 1978; Soloway & Woolf, 1980) about how 
to write recursive programs. Pirolli and Anderson (1984) have developed an ACT production 
system model of recursive programming based on these analyses. We performed an 
experiment in which we either provided students with the standard instruction about how 
recursion is evaluated or with instruction about how to structure recursive code, based on 
our production system model. The standard instruction was modelled on existing textbooks. 
The how-to instruction basically described the productions in our model. 

We had students try to write a set of four simple recursive programs. Students given 
the how-to information took 57 minutes whereas students given the how-it-works information 
took 85 minutes. Of course, how-to and how-it-works information need not be mutually 
exclusive as they were in this experiment, but this experiment makes the point that 
instruction for a skill is most effective when it directly provides information needed in a 
production system model of that skill. 

Instruction in the problem-solving context 

While the Pirolli and Anderson experiment confirmed that how-to instruction was better 
than conventional instruction, there is reason to question how effective it really was. Even 
the best designed instruction is given out of the context of the actual problem-solving 
situation and is often difficult for students to integrate properly into their problem-solving 
efforts. There are memory problems in retrieving what has been read in one context when 
it is needed in another. Secondly, it is very easy to misunderstand abstract instruction. 
For instance, many high school students misunderstand the following statement of the 
side-angle-side postulate: 

"If two sides and the included angle of one triangle are congruent to the 
corresponding parts of another triangle, the triangles are congruent." 

They interpret "included" to mean included within the triangle rather than included between 
the two sides. This occurs despite the fact that the statement of the postulate is 
accompanied by an appropriate diagram. While this particular misunderstanding is so 
common one might imagine placing remedial instruction right in the text, one cannot write 
into the textbook remedial instruction for all possible confusions. This is one of the major 
advantages of private tutors: they can diagnose the misconceptions and provide the 
appropriate instruction in context. 

We (Anderson & Reiser, 1985) have performed a number of studies of our LISP tutor 
in which we contrast one group solving problems on their own with another group solving 
problems with the LISP tutor. We think the important aspect of the tutor is that it 
remediates students' confusions by providing instruction in the context of these confusions. 
One study compared two groups of students who read instruction on recursion much like the 
instruction used in the original Pirolli and Anderson study. However, one group had the 
LISP tutor, which forced them to follow this advice and corrected any misconceptions they 
had about what that advice meant. The other group was left to try to use the instruction as 
they would, as in the original Pirolli and Anderson study. The tutor group took an average 
of 5.76 hrs to go through the problems and got 7.6 points on the recursion section of a 
paper-and-pencil final exam. The no-tutor group took 9.01 hrs and got an average of 4.8 
points. 

In our view much of the advantage of intelligent computer-based tutors is their ability 
to facilitate the conversion of abstract declarative instruction into procedures by providing 
that instruction appropriately in context. Of course, a prerequisite to doing this is to be 
able to correctly interpret the student's behavior in terms of a cognitive model. Much of our 
actual efforts in intelligent tutoring have gone into developing such a cognitive model and 
developing techniques for diagnosing student behavior. 

Inductive Learning 

We have been discussing research that is relevant to the ACT* learning mechanisms 
of strengthening and compilation. In the original ACT* theory there were two other learning 
processes which were responsible for formation of new productions. These learning 
processes actually changed what the system did rather than simply making existing paths 
of behavior more effective. These were the inductive learning mechanisms of generalization 
and discrimination. In contrast to our success in relating strengthening and compilation to 
empirical data, we have had a notable lack of success in our tests of generalization and 
discrimination. 

The mechanisms of generalization and discrimination can be nicely illustrated with 
respect to language acquisition. Suppose a child has compiled the following two 
productions from experience with verb forms: 

IF the goal is to generate the present tense of KICK 
THEN say KICK + S 

IF the goal is to generate the present tense of HUG 
THEN say HUG + S 

The generalization mechanism would try to extract a more general rule that would cover 
these cases and others: 

IF the goal is to generate the present tense of X 
THEN say X + S 
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Discrimination deais with the fact that such rules may be overly general and need to 
be restricted. For instance, the rule above generates the same form independent of 
whether the subject of the sentence is singular and plural. Thus, it will generate errors. 
By considering different features in the successful and unsuccessful situations the 

discrimination mechanisms would generate the following two productions: 

IF the goal is to generate the present tense of X 
and the subject of the sentence is singular 
THEN say X + S 

IF the goal is to generate the present tense of X
and the subject of the sentence is plural 
THEN say X 
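
As an illustration only, the two syntactic mechanisms can be sketched in Python. This is not the ACT* implementation: the names `Production`, `generalize`, and `discriminate`, and the tuple encoding of rules, are assumptions made for the sketch. Generalization replaces the constants that differ between two same-shaped rules with a variable; discrimination splits an overgeneral rule on the context features that differ between a success and a failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Production:
    """A rule whose condition and action are tuples of tokens (hypothetical encoding)."""
    condition: tuple
    action: tuple


def generalize(p1, p2, var="X"):
    """Replace the constants that differ between two same-shaped rules with one variable.

    Returns None if the rules differ in shape or would need more than one variable.
    """
    if len(p1.condition) != len(p2.condition) or len(p1.action) != len(p2.action):
        return None
    mapping = {}  # (token in p1, token in p2) -> variable name

    def merge(a, b):
        if a == b:
            return a
        mapping[(a, b)] = var
        return var

    condition = tuple(merge(a, b) for a, b in zip(p1.condition, p2.condition))
    action = tuple(merge(a, b) for a, b in zip(p1.action, p2.action))
    if len(mapping) > 1:
        return None
    return Production(condition, action)


def discriminate(rule, success_ctx, failure_ctx, corrected_action):
    """Split a rule on the context features that differ between a success and a failure."""
    diff = sorted(k for k in success_ctx if failure_ctx.get(k) != success_ctx[k])
    keeps_action = Production(
        rule.condition + tuple((k, success_ctx[k]) for k in diff), rule.action
    )
    new_action = Production(
        rule.condition + tuple((k, failure_ctx.get(k)) for k in diff), corrected_action
    )
    return keeps_action, new_action


# The verb-form example from the text.
kick = Production(("present-tense", "KICK"), ("say", "KICK", "+S"))
hug = Production(("present-tense", "HUG"), ("say", "HUG", "+S"))
general = generalize(kick, hug)
singular_rule, plural_rule = discriminate(
    general, {"subject": "singular"}, {"subject": "plural"}, ("say", "X")
)
```

Note that both functions inspect only the form of the rules and contexts, which is what makes them syntactic in the sense discussed here.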

These discrimination and generalization mechanisms are very much like knowledge
acquisition mechanisms that have been proposed in the artificial intelligence literature (e.g.,
Hayes-Roth & McDermott, 1976; Vere, 1977). In particular, they are called syntactic methods
in that they only look at the form of the rule and the form of the contexts in which it
succeeds or fails. There is no attempt to use any semantic knowledge about the context to
influence the rules that are formed. A consequence of this feature in the ACT theory is
that generalization and discrimination are regarded as automatic processes, not subject to
strategic influences and not open to conscious inspection. The reports of unconscious
learning by Reber (1976) seem consistent with this view. He finds that subjects can learn
from examples whether strings are consistent with the rules of finite state grammars without
ever consciously formulating the rules of these grammars.

There are now a number of reasons (Anderson, in press) for questioning whether the
ACT theory is correct in its position that inductive learning is automatic. First, there is
evidence that the generalizations people form from experience are subject to strategic
control (Elio & Anderson, 1984; Kline, 1983). In a prototype formation experiment, Elio and
Anderson (1983) showed that subjects could adopt either memorization or hypothesis
formation strategies, and the two strategies led them to differential success, depending on
what the instances were. Second, Lewis and Anderson found that subjects were able to
restrict the application of a problem-solving operator (i.e., discriminate a production) only if
they could consciously formulate the discrimination rule. Indeed, there is now reason to
believe that even in the Reber unconscious learning situation subjects have conscious access
to low-level rules that help them classify the examples (Dulany, Carlson, and Dewey, 1984).
They do not form hypotheses about finite state grammars but do notice regularities in the
example sentences (e.g., grammatical strings have two X's in second and third position).

Another interesting problem with the ACT generalization mechanism is that subjects
often appear to emerge with generalizations from a single example (Elio & Anderson, 1983;
Kieras & Bovair, 1985), while the syntactic methods of ACT* are intended to extract the
common features of a number of examples. It is interesting to note that the coding of the
LISP function FIRST at the beginning of this paper was based on extraction of
generalizations by analogy from the single F-TO-C function. By compiling that analogy
process, the student simulation emerged with general production rules for the syntax of
function definitions and for specifying a function argument within a function-definition context.
It is hard to see how one could extract a generalization from a single example without
some semantic understanding of why the example worked. Otherwise, it is not possible to
know which features are critical and which features can be ignored (generalized over).

This is an example of a more general point, which is the leaning of our current
thinking (Anderson, in press) about induction. The actual process of forming a generalization
or a discrimination can be modelled by a set of problem-solving productions. The example
above is a case where the problem-solving method is analogy, but other weak
problem-solving methods can apply also. The inductive processes of generalization and discrimination
are things that can be implemented by a set of problem-solving productions. Also, because 
productions are sensitive to the current contents of working memory, induction can be 
influenced by semantic and strategic factors, as it apparently is. Since the information used 
by the productions has to be in working memory, people should have conscious access
to the information they are using for induction, which they apparently do. Knowledge 
compilation can convert these inductive problem-solving episodes into productions that 
generalize beyond the current example. 

Thus, we see that there really is no logical need to propose separate learning 
processes of discrimination and generalization. Besides this argument of parsimony, the
empirical evidence does not seem consistent with such unconscious learning processes. On
the other hand, the empirical evidence does seem consistent with the unconscious
strengthening and compilation processes. We have never observed subjects to report the 
changing strengths of their procedures nor compilation of productions. 

Conclusions 

The ACT theory contains within it the outline of an answer to the epistemological
question, "How does structured cognition emerge?" The answer is that we approach a new
domain with general problem-solving skills such as analogy, trial and error search, or
means-ends analysis. Our declarative knowledge system has the capacity to store in relatively
unanalyzed form our experiences in any domain, including instruction if it is available,
models of correct behavior, successes and failures of our attempts, etc. It is the basic
characteristic of the declarative system that it does not require that one understand how the
knowledge will be used before it is stored. This means that we can easily get relevant
knowledge into our system but that considerable effort may have to be expended when it
comes time to convert this knowledge to behavior.

The knowledge learned about a domain feeds the weak but general problem-solving
procedures which try to generate successful behavior in that domain. Knowledge compilation
produces new production rules that summarize the outcome of the efforts, and strengthening
enhances those production rules that repeatedly prove useful. The knowledge-compilation
process critically depends on the goal structures generated in the problem solving. These
goal structures indicate how to fold the sequence of events into new productions.

Most of our research has been looking at adults or near-adults learning very novel 
skills like LISP programming. However, the learning theory is intended to generalize 
downward to child development. Young children are the universal novices: everything they
are being asked to learn is a novel skill. In looking at something like LISP programming
we are just finding the best approximation to the child's situation in the experimentally more
tractable adult. The claim would be that children bring weak problem-solving methods to
new domains and, like adults, eventually compile domain-specific procedures. Klahr (1985)
has found evidence for early use of weak methods like means-ends analysis and hill climbing.

The theory of knowledge proposed here is not an extreme empiricist theory in that it
requires that the learner bring some prior knowledge to the learning situation. On the other
hand, the amount and kind of knowledge being brought to bear is quite different than
assumed in many nativist theories. In fact, the real knowledge comes in the form of the
weak but general problem-solving procedures that the learner applies to initial problems in
these domains. These procedures enable the initial performance and goal structures that
impose an organization on the behavior which allows the knowledge compilation process to
operate successfully. Given this organization, knowledge compilation is just a process of
learning by contiguity applied to production systems. The goal structures provide the
belongingness that Thorndike (1935) saw as a necessary complement to learning by
contiguity.

If this view of the development of knowledge is correct, the relevant question
becomes the nature and origin of the weak methods. Currently, they are modelled in ACT
as production sets. Laird and Newell (1983) in their analysis of weak methods similarly
model them. In ACT they are assumed as givens, whereas Laird and Newell treat them as
the product of encoding the domain. Laird and Newell propose that a person starts out with a
single "universal weak method" that serves to interpret knowledge encoded about a
domain. This universal weak method is encoded as a set of
productions. When the person encodes new information about a specific domain, this
information is encoded by additional productions. (The Laird and Newell theory does not
have declarative, non-production encodings.) The domain-specific productions in the context
of the universal weak method can lead to others of the weak methods.

Of course, much learning is not in completely novel domains. If the learner knows
something relevant, then the course of knowledge acquisition can be altered quite
fundamentally. We saw how productions can transfer wholesale from one domain to
another. Even if this is not possible, one can use the structure of a solution in one domain
as an analog for the structure of the solution in another domain. Also, one can use one's
knowledge to interpret the declarative information and so store a highly sophisticated
interpretation of the knowledge that is presented. We have been comparing students
learning LISP as a first programming language versus students learning LISP after Pascal.
The Pascal students learn LISP much more effectively, and one of the reasons is that they
appreciate much better the semantics of various programming concepts, such as what a
variable is. Weak problem-solving methods like analogy can be much more effective if they
can more richly represent the knowledge. Indeed, Pirolli and Anderson (1985) looked at the
acquisition of recursive programming skill and found that almost all students develop new
programs by analogy to example recursive programs but that their success is determined by
how well they understand why these examples work.



Appendix: Knowledge Compilation: Technical Discussion 

The description below presents knowledge compilation as it is implemented in the
GRAPES production system (Sauers & Farrell, 1982), which is intended to embody certain
aspects of the ACT* theory. Knowledge compilation consists of two subcomponents,
proceduralization and composition. Proceduralization eliminates reference to certain
declarative facts by building into productions the effect of that reference. Composition
collapses a number of productions into one. These will be discussed separately.

Proceduralization

Proceduralization requires a separation between goal information and context
information in the condition of a production. Consider the following production:

G1 IF the goal is to create a structure
   and there is an operation that creates such a structure
   and it requires a set of substeps
   THEN set as subgoals to perform those steps

This is a classic working-backwards operator, an instance of a general, weak
problem-solving method. This production might apply if our goal was to insert an element into a list
and we knew that there was a LISP function CONS which achieved this, i.e.,
(CONS A (B C)) = (A B C). In this production, the first line of the condition describes the current goal
and the subsequent context lines identify relevant information in declarative memory.
Proceduralization eliminates the context lines but gets their effect by building a more specific
goal description:

D1 IF the goal is to insert an element into a list
   THEN write CONS and set as subgoals to
        1. Code the element
        2. Code the list

The transition from the first production to the second is an example of the transition from
domain-general to domain-specific productions.

To understand in detail how domain-general productions apply and how
proceduralization occurs, one needs to be more precise about the encoding of the
production, the goal, and the knowledge about CONS. With respect to the production, we
have to identify its variable components. Below is a production more like its GRAPES
implementation, where terms prefixed by "=" denote variables:

G1' IF the goal is to achieve =relation on =arg1 and =arg2
    and =operation achieves =relation on =arg1' and =arg2'
    and requires =list as substeps
    THEN make =list subgoals

The goal is "to achieve insertion of arg1 into arg2," and our knowledge about CONS is
represented:

CONS achieves insertion of argument1 into argument2, and it involves the substeps of
writing "(", writing CONS, coding argument1, coding argument2, and writing ")".

The production G1' applies to the situation with the following binding of variables:

=relation     insertion
=operation    CONS
=arg1         arg1
=arg2         arg2
=arg1'        argument1
=arg2'        argument2
=list         writing "(", writing CONS, coding argument1,
              coding argument2, and writing ")"

Thus the production would execute and set the subgoals of doing the steps in =list.
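
The binding step can be sketched as a small pattern matcher. This is a minimal sketch and not the GRAPES matcher: the flat tuple encoding of the goal and of the CONS knowledge, and the names `is_var` and `match`, are assumptions made for illustration. Tokens beginning with "=" are treated as variables, following the notation above.

```python
def is_var(token):
    """Variables are strings prefixed by '=' (the notation used for G1')."""
    return isinstance(token, str) and token.startswith("=")


def match(pattern, fact, bindings=None):
    """Match a tuple pattern against a tuple fact.

    Returns the extended variable bindings, or None if the match fails.
    """
    bindings = dict(bindings or {})
    if len(pattern) != len(fact):
        return None
    for p, f in zip(pattern, fact):
        if is_var(p):
            # A variable must keep the value it was first bound to.
            if bindings.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return bindings


# Condition clauses of G1' (hypothetical flat encoding).
GOAL_PATTERN = ("achieve", "=relation", "=arg1", "=arg2")
KNOWLEDGE_PATTERN = ("=operation", "achieves", "=relation", "=arg1'", "=arg2'", "=list")

# The current goal and the declarative knowledge about CONS.
goal = ("achieve", "insertion", "arg1", "arg2")
cons_fact = (
    "CONS", "achieves", "insertion", "argument1", "argument2",
    ('write "("', "write CONS", "code argument1", "code argument2", 'write ")"'),
)

bindings = match(GOAL_PATTERN, goal)
bindings = match(KNOWLEDGE_PATTERN, cons_fact, bindings)
```

Because `=relation` is already bound to `insertion` by the goal clause, the knowledge clause can only match a fact about an operation that achieves insertion, which is how the goal selects the relevant declarative knowledge.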

The actual execution of the production required that the definition of CONS be held
active in working memory and be matched by the production. This can be eliminated by
proceduralization, which builds a new production that contains the relevant aspects of the
definition within it. This is achieved by replacing the variables in the old production by what
they matched to in the definition of CONS. The proceduralized production that would be
built in this case is:

D1' IF the goal is to achieve insertion of =arg1 into =arg2
    THEN make subgoals to
         1. write (
         2. write CONS
         3. code =arg1
         4. code =arg2
         5. write )

or, as we generally write such productions for simplicity:

D1'' IF the goal is to insert =arg1 into =arg2
     THEN write CONS and set as subgoals to
          1. code =arg1
          2. code =arg2

In general, proceduralization operates by eliminating reference to the declarative knowledge
that permitted the problem solution by the weak-method productions and building that
knowledge into the description of the problem solution.
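
The substitution can be sketched as follows. This is a minimal sketch under assumed encodings, not the GRAPES code: `proceduralize`, `BINDINGS`, and `ROLE_MAP` are hypothetical names, and the bindings are written out by hand from the table above. Variables matched against declarative memory are replaced by their values, the context clauses are dropped, and the goal arguments stay open as variables.

```python
def proceduralize(goal_pattern, bindings, role_map):
    """Build a domain-specific production from a matched weak-method production.

    goal_pattern: goal clause of the general production.
    bindings:     variable -> value from the match against declarative memory.
    role_map:     names used inside the declarative fact -> goal variables kept open.
    """
    keep_open = set(role_map.values())
    # Goal arguments stay variable; everything else is replaced by its bound value.
    condition = tuple(
        tok if tok in keep_open else bindings.get(tok, tok) for tok in goal_pattern
    )
    # The substeps come from the fact, rewritten in terms of the goal variables.
    subgoals = tuple(
        tuple(role_map.get(tok, tok) for tok in step) for step in bindings["=list"]
    )
    return {"if_goal": condition, "then_subgoals": subgoals}


BINDINGS = {
    "=relation": "insertion",
    "=operation": "CONS",
    "=arg1": "arg1",
    "=arg2": "arg2",
    "=arg1'": "argument1",
    "=arg2'": "argument2",
    "=list": (
        ("write", "("), ("write", "CONS"),
        ("code", "argument1"), ("code", "argument2"),
        ("write", ")"),
    ),
}
# argument1 in the CONS definition plays the role of the goal's =arg1, and so on.
ROLE_MAP = {BINDINGS["=arg1'"]: "=arg1", BINDINGS["=arg2'"]: "=arg2"}

d1_prime = proceduralize(("achieve", "=relation", "=arg1", "=arg2"), BINDINGS, ROLE_MAP)
```

The result corresponds to D1': its condition mentions only the goal, and the five substeps no longer require the CONS definition to be active in working memory.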



It would be useful to have a detailed analysis of proceduralization of solution by
analogy, since this figured so prominently in the discussion of the paper. Usually, analogy is
implemented in ACT by the operation of a sequence of productions (e.g., see the Appendix
to Ch. 5 in Anderson, 1983), but the example here will compact the whole analogy process
into a single production.

G2 IF the goal is to achieve =relation1 on =arg3
   and (=function =arg4) achieves =relation1 on =arg4
   THEN write (=function =arg3)

Suppose the subject wanted to get the first element of the list (a b c) and saw the
example from LISP that (CAR '(b r)) = b. Then this production would code (CAR (a b c))
by analogy. Proceduralization can operate by eliminating reference to the declarative source
of knowledge; in this case, the declarative source of knowledge is the example that
(CAR '(b r)) = b. It would produce the new production:

D2 IF the goal is to achieve the first-element of =arg3
   THEN write CAR and set as a subgoal to
        1. code =arg3
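
The contrast between G2 and D2 can be sketched in a few lines. This is an illustration under assumed encodings, not the ACT simulation: `analogize` and `proceduralize_analogy` are hypothetical names, and a worked example is encoded as a (function, argument, relation) triple.

```python
def analogize(goal_relation, goal_arg, example):
    """G2: reuse the function of a worked example that achieved the same relation."""
    function, _example_arg, example_relation = example
    if example_relation != goal_relation:
        return None
    return [function, goal_arg]


def proceduralize_analogy(example):
    """D2: build the example's function and relation into a rule of its own."""
    function, _example_arg, relation = example

    def rule(goal_relation, goal_arg):
        # The worked example is no longer consulted when the rule applies.
        if goal_relation != relation:
            return None
        return [function, goal_arg]

    return rule


# The worked example (CAR '(b r)) = b, encoded as a triple.
car_example = ("CAR", ["b", "r"], "first-element")

by_analogy = analogize("first-element", ["a", "b", "c"], car_example)
d2 = proceduralize_analogy(car_example)
by_rule = d2("first-element", ["a", "b", "c"])
```

Both calls produce the same code for the goal, but only the first needs the example to be available when the goal arises.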

Composition

Composition (Lewis, 1978) is the process of collapsing multiple productions into single
productions. Whenever a sequence of productions applies in ACT and achieves a goal, a
single production can be formed which will achieve the effect of the set. Note that the
existence of goals is absolutely critical in defining what sequences of productions to
compose.

While many times composition applies to sequences of more than two productions, its
effect on longer sequences is just the composition of its effect on shorter sequences.
Thus, if S1, S2 is a sequence of productions to be composed and C is the composition
operator, C(S1, S2) = C(C(S1), C(S2)). So all we have to do is specify the pairwise
compositions.

Let "IF C1 THEN A1" and "IF C2 THEN A2" be a pair of productions to be
composed, where C1 and C2 are conditions and A1 and A2 are actions. Then their
composition is "IF C1 & (C2 - A1) THEN (A1 - G(C2)) & A2." C2 - A1 denotes the
conditions of the second production not satisfied by structures created in the action of the
first. All the conditional tests in C1 and (C2 - A1) must be present from the beginning if
the pair of productions is to fire. A1 - G(C2) denotes the actions of the first production
minus the goals created by the first production that were satisfied by the second.
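
The pairwise rule can be sketched over ground (already instantiated) productions. This is a sketch under stated assumptions, not the GRAPES composition code: clauses are tuples, goal clauses start with the token "goal", variables are ignored, and the second production's actions are spliced in at the position of the goal they satisfied so that the order of steps is preserved. Reading the reduction of longer sequences to pairwise compositions as a left fold is also an assumption.

```python
from functools import reduce
from typing import NamedTuple


class Production(NamedTuple):
    conditions: tuple  # clauses that must hold before firing
    actions: tuple     # clauses created by firing, in order


def compose(p1, p2):
    """IF C1 & (C2 - A1) THEN (A1 - G(C2)) & A2, for two ground productions."""
    created = set(p1.actions)
    # C2 - A1: tests of the second production not supplied by the first one's actions.
    extra = tuple(
        c for c in p2.conditions if c not in created and c not in p1.conditions
    )
    # G(C2): goals set by the first production that the second production satisfied.
    satisfied = {c for c in p2.conditions if c in created and c[0] == "goal"}
    actions, spliced = [], False
    for a in p1.actions:
        if a in satisfied:
            if not spliced:  # put A2 where the satisfied goal used to be
                actions.extend(p2.actions)
                spliced = True
        else:
            actions.append(a)
    if not spliced:
        actions.extend(p2.actions)
    return Production(p1.conditions + extra, tuple(actions))


def compose_sequence(productions):
    """Collapse a longer sequence by repeated pairwise composition."""
    return reduce(compose, productions)


# Inserting the first element of list L1 into list L2 (ground versions of D1'' and D2).
insert_rule = Production(
    conditions=(("goal", "insert", ("first", "L1"), "L2"),),
    actions=(
        ("write", "CONS"),
        ("goal", "code", ("first", "L1")),
        ("goal", "code", "L2"),
    ),
)
first_rule = Production(
    conditions=(("goal", "code", ("first", "L1")),),
    actions=(("write", "CAR"), ("goal", "code", "L1")),
)
composed = compose(insert_rule, first_rule)
```

Here the only condition of the second production is a goal set by the first, so C2 - A1 is empty and the composed production tests only the original goal.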

As an example, consider a situation where we want to insert the first element of one 
list into a second list. The first production above would fire and write CONS, setting 
subgoals to code the two arguments to CONS. Then the second production above would 
fire to code CAR and set a subgoal to code the argument to CAR. Composing these two 
together would produce the production: 

C3 IF the goal is to insert the first element of =arg3 into =arg2
   THEN code CONS and then code CAR and set as subgoals to
        1. code =arg3
        2. code =arg2

It is just productions of this form that we speculate subjects were forming in the McKendree 
and Anderson experiment described in the main part of this paper. 

Composition and Proceduralization 

Much of the power of knowledge compilation in the actual simulations comes from 
the combined action of proceduralization and composition together. The composed 
productions C1 and C2 discussed with respect to Figure 1 earlier have such a history. A
number of analogical productions had to be proceduralized and composed to form each
production. Thus, in real learning situations we simultaneously drop out the reference to
declarative knowledge and collapse many (often more than two) productions into a single
production. This can produce the enormous qualitative changes we see in problem solving
with just a single trial.

Figure Captions

Figure 1  A representation of the goal structure in the subject BR's solution to the
          problem of writing the function FIRST. The boxes represent goals, and
          the arrows indicate that a production has decomposed the goal above into
          the subgoals below. Checks indicate successful goals, and X's indicate
          failed goals. The dotted lines indicate parts of the goal tree combined
          in composition.

Figure 2  A representation of the goal structure set up by the production for
          text-editing in ED. Labelled are goals and goal structures in common with
          EDT or EMACS. Goal structures not in common with either are labelled
          0000.

Figure 3  Transfer among text editors.

Figure 4  Transfer between EMACS and Perverse EMACS.

Figure 5  An example of the problem description used in the Lewis and Anderson
          (1985) experiment.

Table 1

Sample material from McKendree and Anderson (in press)

ITEM                                    GROUP 1      GROUP 2
                                        FREQUENCY    FREQUENCY

(CAR (CAR '((A B) (C D) (E F)))) -         12            4
(CAR (CDR '((A B) (C D) (E F)))) -          4           12
(CDR (CAR '((A B) (C D) (E F)))) -          4           12
(CDR (CDR '((A B) (C D) (E F)))) -         12            4

Notes

1. This research is supported by Contract No. N00014-84-K-0064 from the Office of
Naval Research and Grant No. IST-83-18629 from the National Science Foundation. I would
like to thank Al Corbett, David Klahr, Allen Newell, and Lynne Reder for their comments on
this manuscript.

2. The assumption, which I will elaborate later, is that the domain-general productions
are innate.

3. The single quote in front of (LIST1) causes LISP to treat this as literally a list with
the element LIST1.




You are in a room which contains 

a brick fireplace
a foul-smelling troll roomkeeper
a polished brass door 

You hear wicked laughter 

The room also contains hairy spiders 

The room is musty and dusty 

You also notice burning torches 

... What do you want to do? 